Identification of quantitative trait loci (QTL) and meta-QTL analysis for kernel size-related traits in wheat (Triticum aestivum L.)

Background Kernel size-related traits, including kernel length (KL), kernel width (KW), kernel diameter ratio (KDR) and kernel thickness (KT), are critical determinants for wheat kernel weight and yield and highly governed by a type of quantitative genetic basis. Genome-wide identification of major and stable quantitative trait loci (QTLs) and functional genes are urgently required for genetic improvement in wheat kernel yield. A hexaploid wheat population consisting of 120 recombinant inbred lines was developed to identify QTLs for kernel size-related traits under different water environments. The meta-analysis and transcriptome evaluation were further integrated to identify major genomic regions and putative candidate genes. Results The analysis of variance (ANOVA) revealed more significant genotypic effects for kernel size-related traits, indicating the moderate to high heritability of 0.61–0.89. Thirty-two QTLs for kernel size-related traits were identified, explaining 3.06%—14.2% of the phenotypic variation. Eleven stable QTLs were detected in more than three water environments. The 1103 original QTLs from the 34 previous studies and the present study were employed for the MQTL analysis and refined into 58 MQTLs. The average confidence interval of the MQTLs was 3.26-fold less than that of the original QTLs. The 1864 putative candidate genes were mined within the regions of 12 core MQTLs, where 70 candidate genes were highly expressed in spikes and kernels by comprehensive analysis of wheat transcriptome data. They were involved in various metabolic pathways, such as carbon fixation in photosynthetic organisms, carbon metabolism, mRNA surveillance pathway, RNA transport and biosynthesis of secondary metabolites. Conclusions Major genomic regions and putative candidate genes for kernel size-related traits in wheat have been revealed by an integrative strategy with QTL linkage mapping, meta-analysis and transcriptomic assessment. The findings provide a novel insight into understanding the genetic determinants of kernel size-related traits and will be useful for the marker-assisted selection of high yield in wheat breeding. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-022-03989-9.

yield is significantly influenced by thousand kernel weight (TKW), kernel number per spike (KNS), and spike number per unit area (SN) [3,4]. Of these, TKW has been selected as an essential trait in wheat breeding programs, due to its high heritability [5]. Kernel size-related traits, as one of the critical factors determining the formation of kernel weight, are mainly composed of kernel length (KL), kernel width (KW), kernel diameter ratio (KDR) and kernel thickness (KT) [6]. Larger kernels positively influence wheat seedling growth and significantly contribute to high-yield improvement [4,7,8]. Therefore, deciphering the genetic basis and finding functional genes for kernel size are critical for the enhancement of grain yield traits in wheat.
Previous studies reported that QTLs for grain size were generally mapped in large confidence interval (CI) with minor effects and are significantly influenced by different genetic backgrounds and environments, which limits the usefulness of these QTLs in wheat breeding programs [26]. The meta-QTL analysis is a robust method for the genetic analysis of complex traits by integrating QTLs from different studies to obtain stable genetic regions controlling a quantitative trait [27]. Compared to QTLs identified in a single study, MQTLs have the advantage of a smaller CI and a higher consistency under different genetic backgrounds. The meta-QTL analysis also facilitates the identification of candidate genes in a genome as complex as wheat.
MQTL analysis has been successfully applied in various crops, including maize [28][29][30][31], rice [26,32,33] and soybean [34]. MQTL analysis in wheat has also been effectively used to establish the consensus map of QTLs for many agronomic traits [35][36][37]. Previous studies integrated QTLs for yield and yield-related traits from published articles. They identified 12 significant MQTLs on chromosomes 1A, 1B, 2A, 2D, 3B, 4A, 4B, 4D and 5A, including two critical underlying genes, Rht and Vrn [38]. Tyagi et al. (2015) performed a meta-analysis of QTLs associated with kernel morphological traits and mapped 17 MQTLs on seven chromosomes in wheat [39]. In a previous study, a total of 2230 QTLs for yield and yieldrelated traits were used for meta-QTL analysis and 145 MQTLs were identified, of which 85 were verified by GWAS using different natural populations. Within 76 MQTL core intervals, 237 candidate genes involved in photoperiod response, kernel development, multiple plant growth regulatory pathways, carbon and nitrogen metabolism and ear and flower organ development were identified through searching for sequence homology and expression analysis [37]. Meanwhile,  performed a meta-analysis with 381 QTL related to yield and identified 86 MQTL and 210 candidate genes in wheat [40]. In addition to yield-related traits, MQTL analysis was also used to discover consistent QTLs and identify candidate genes for various quantitative traits such as leaf rust [41], drought and heat tolerance [42][43][44], salt tolerance [45] and disease resistance [46][47][48].
The present study used the inclusive composite interval mapping (ICIM) method to identify the QTLs controlling kernel size-related traits across seven environments. We performed a meta-analysis by combining the QTLs detected in our study with the 1071 QTLs from previous studies. Our main objectives were to: (1) identify stable QTL for traits related to kernel size in seven environments; (2) discover and map MQTLs from numerous reported QTL and current studies; and (3) identify candidate genes related to kernel size associated with MQTL intervals.

Phenotypic and correlation analyses
In the field trials conducted in seven environments (E1-E7), the parental line Q9086 had a significantly longer and wider kernel than Longjian19 (Table S1). In KT, the parental line Longjian19 had an advantage over Q9086. In the RILs population, all traits varied widely and had an approximately normal distribution with significantly transgressive segregation (Fig. 1). The coefficients of variation for KL, KW, KDR and KT ranged from 3.47% to 5.71%, 2.47% to 6.27%, 3.24% to 8.57% and 3.88% to 5.45%, respectively. The ANOVA of four kernel sizerelated traits revealed significant differences (P < 0.01) in the variation factors of environment, genotype, and genotype × environment interaction. Among the kernel size-related traits, KL (h 2 = 0.89) and KDR (h 2 = 0.70) were highly heritable, followed by KW (h 2 = 0.67) and KT (h 2 = 0.61) (Table S2).

QTLs identified under different water environments
In the present study, 23 QTLs for kernel size-related traits were detected under DS and WW environments (

Initial QTLs collection for wheat kernel size-related traits
By integrating 1071 initial QTLs from 34 QTL studies published between 2007 and 2020 (Table S4) and 32 QTLs identified in this study, a total of 1103 initial QTLs for kernel size-related traits were used for MQTL analysis (Fig. 4a). The distribution of initial QTLs significantly differed from homoeologous groups, subgenomes and individual chromosomes. For example, the number of identified QTLs ranged from 101 on homoeologous group VII to 241 on group II, and from 15 on chromosome 4D to 117 on chromosome 2D (Fig. 4b). Of the 1103 initial QTLs, 399, 433 and 271 QTLs were distributed among sub-genomes A, B and D, respectively (Fig. 4d). The CI ranged from 0.14 cM to 190 cM, with an average of 14.52 cM (Fig. 4c). The proportion of phenotypic variance explained by individual QTL ranged from 1.00% to 86.31%, with an average of 9.98% (Fig. 4c).

Candidate genes mining and expression analysis
We identified 1864 potential candidate genes in 12 core MQTL intervals, with the lowest (1) and highest (487) number of potential candidate genes in the MQTL7B. 4 and MQTL2A.2 intervals, respectively. The potential candidate genes within the regions of 12 MQTLs were screened and annotated based on IWGSC RefSeq v1.1 from the Chinese Spring wheat reference genome (Table S6). The GO terms associated with biological processes belonged to metabolic and cellular (229 and 210 potential candidate genes, respectively) pathways (Fig. 7). GO terms associated with molecular function were related to binding and catalytic activity (380 and 260 potential candidate genes, respectively). Regarding the cellular component, potential candidate genes were mainly related to the cell and cell part, with 130 and 128 potential candidate genes, respectively. KEGG analysis for potential candidate genes revealed that ubiquitin-mediated proteolysis and plant hormone signaling are the two most important pathways involved in the metabolic process (Fig. 8).

Discussions
Grain yield is influenced by the combination of kernel weight and number per spike [14,49]. TKW is not only one of the critical components of grain yield, but also is commonly used as a common factor for determining commercial value in wheat. Kernel size and shape, including KL, KW and KT, are strongly and positively correlated with TKW [50][51][52]. A bigger kernel positively affects wheat kernel weight, yield and commercial value [40,53]. Kernel size-related traits influence wheat yield by regulating TKW, and both are associated with high heritability [16,[54][55][56][57][58][59].
MQTL analysis is a powerful strategy for validating consistent QTLs by integrating independent QTLs from different trials on a consensus or reference map [27,73]. In the present study, a total of 1103 initial QTLs from previous mapping studies and identified in this study were performed MQTL analysis to identify key genomic regions linked to kernel size-related traits in wheat (Table S4, Fig. 4). As a result, 346 initial QTLs were finally refined into 58 MQTLs on chromosomes 1B, 1D, 2A, 3D, 4A, 5B, 5D, 6B, 7A, 7B and 7D (Table S5, Fig. 5). The average 95% CI of MQTLs (4.46 cM) was 3.26-fold less than that of initial QTLs (14.54 cM). The result was similar to previous MQTL analysis for grain yield and yield-related traits, where the average CI of MQTLs was 2.9-fold lower than that of the initial QTLs [37]. Most of the MQTLs in the present study controlled more than one trait, likely indicating either a tight linkage of genes or the presence of pleiotropic genes for controlling kernel size-related traits [37,43,48,73]. By the peak marker sequences compared with the wheat genome reference sequence of Chinese Spring, 58 MQTLs had definite physical positions and the physical intervals ranged from 1.54 Kb to 580.66 Mb (Table S5). Of these, six MQTLs, such as MQTL1D.2, MQTL4A.2, MQTL7B.1, MQTL7B.4, MQTL7B.5 and MQTL7B.6, showed narrower physical intervals (< 5 Mb), shorter genetic distance (< 10 cM) and more initial QTLs (> 2) (Table S5). These MQTLs are promised to be used in future marker-assisted selection for improving kernel size, and for isolating key genes by the map-based cloning approach in wheat.
Candidate genes related to important agronomic traits in wheat have been identified by MQTL analysis [37,40,43,74,75]. Nadolska-Orczyk et al. (2017) classified the genes controlling kernel yield into five categories:  [76]. Understanding the genetic and physiological pathways involved in grain development is of great help for investigating traits related to kernel size. In this study, we detected 1864 potential candidate genes in 12 core MQTL intervals with a physical interval of less than 20 Mb using the wheat genome reference sequence of Chinese Spring. Among 1864 potential candidate genes, 70 candidate genes were mainly expressed in the spike and grain at different developmental stages (Table 1, Fig. 9), consistent with those previously reported by Yang et al. (2021) [37]. In recent years, the analysis of homology relationships between wheat and rice facilitates the cloning of several yield-related genes such as TaFlo-A1 [77], TaCKX6-D1 [78] and TaTGW6-A1 [79]. In the present study, 17 out of 70 candidate genes homologous to rice genes were found within nine core MQTL intervals (Table 1). Of these, a key gene TraesCS3D02G024700 in the MQTL3D.1 interval was homologous to the gene OsCYP709C5 involved in regulating cytochrome P450 in rice [80]. Guo et al. (2021) also showed that constitutive overexpression of TaCYP78A5 significantly increased seed size and weight [81]. The ubiquitin-proteasome pathway has been associated with seed size development in wheat and rice. The corresponding genes, e.g., TaGW2-6A/6B [82,83] and OsUBC [84] have been cloned in wheat and rice, respectively. According to a previous study, carbohydrate metabolism is essential to yield and yield-related traits [76]. The gene TraesC-S7D02G149000 identified in the MQTL7D.2 region was homologous to the genes of OsSWEET15 in rice [85] and TaSWEETs in wheat [86,87], which were identified as the key gene involved in the sucrose transport pathway in rice [85] and floral development in wheat [76]. The gene of TraesCS7D02G149500 (MQTL7D.2) was identified as an orthologous gene of DPL1/2, involved in pollen hybrid incompatibility in rice [88]. In this study, the orthologous genes of DEP2, EP2 and SRS1 were found in the MQTL7B.1 region as TraesCS7B02G002900 and TraesCS7B02G003000, which was involved in regulating kernel size and yield [89,90]. In addition, the remaining 53 candidate genes were involved in various signaling pathways, such as zinc finger protein [91], transcription factors [17] and glycosyltransferase [92], which are also involved in the regulation of yield and yield-related traits.

Conclusions
In this study, we found that kernel size-related traits in wheat are predominantly regulated by genetic factors with moderate and high heritability. Most of stable QTLs were detected under both well-watered and droughtstressed conditions. Potential candidate genes expressed in spike and grain were identified through meta-QTL and  in-silico expression analysis. The markers closely linked to stable QTLs had great potential in the marker-assisted breeding program and the identification of candidate genes advanced the understanding of the genetic basis governing kernel size in wheat.

Plant materials and field trials
A RILs population consisting of 120 lines derived from the cross between two winter wheat cultivars, Longjian19 and Q9086 [93]. The male parent, Longjian19, released by the Gansu Academy of Agricultural Sciences, Lanzhou, Gansu, is an elite drought-tolerant variety widely grown in rainfed areas (300-500 mm annual rainfall) in northwest China. The female parent Q9086, is a high-yielding cultivar developed by Northwest Agriculture and Forestry University, Yangling, Shanxi, China. It is suitable for cultivation under conditions with sufficient water and high fertility. The two parents differ significantly from several physiological and agronomical traits, especially under rainfed environments [93][94][95]. only (designated E6 and E7, respectively). The two cropping sites are characterized by a typical dry inland environmental condition in Northwest China, where the annual average temperature is about 7.0 °C, the annual rainfall is less than 400 mm with approximately 60% falling from July to September, but the annual evapotranspiration capacity is more than 1500 mm. The two water treatments in different locations and years were conducted in field conditions without any rainout shelter. The DS treatments were equivalent to the rainfed condition in each growing season, whereas the WW treatments were irrigated with a water supply of 75 mm at the spike emergence (Zadoks 55) and grain filling (Zadoks 71) stages, respectively. Here, the decimal codes for the growth stages of wheat are described by Zadoks et al. (1974) [96]. In this case, the rainfall of the DS plots in each field environment was 164.3 mm (E1) to 296.5 mm (E7) (Fig. S1). All progenies and parents were sown in late September and harvested in early July of the following year. A randomized complete block After harvesting, two hundred seeds for each line were used to measure kernel length (KL), kernel width (KW) and kernel diameter ratio (KDR) with the SC-G wheat grain appearance quality image analysis system (Hangzhou WSeen Detection Technology Co., Ltd, Hangzhou, China). The kernel thickness (KT) was determined with a vernier caliper. All measurements were conducted with three biological replicates. The average values of the traits were used for QTL analysis.

Statistical analysis
Statistical analysis and analysis of variance (ANOVA) were performed using SPSS 22.0 (IBM Corporation, Armonk, NY, United States). According to the method described by Toker et al. (2004) [97], the broad-sense where σ 2 g , σ 2 ge and σ 2 e estimate genotype, genotype × environment interaction and residual error variances, respectively, and e and r are the numbers of environments and replicates per environment, respectively. The correlation among KL, KW, KDR and KT in the RILs population was also assessed.

Construction of linkage map and QTL analysis
For QTL mapping, a genetic map consisting of 524 SSR markers, described in a previous study was used [98]. These markers were distributed among 21 linkage groups and covered a total genetic distance of 2266.72 cM with an average distance of 4.33 cM between adjacent markers.
The inclusive composite interval mapping (ICIM) method was performed using the QTL software Ici-Mapping V4.1 to determine the positions and effects of QTLs [99]. QTL with LOD value ≥ 2.5, as determined by 1000 permutation tests at P ≤ 0.05, were declared for the presence of significant QTL. QTLs were named based on the International Rules for Genetic Nomenclature (http:// wheat. pw. usda. gov/ ggpag es/ wgc/ 98/ intro. htm). QTLs detected in at least three of seven environments were considered stable QTLs. QTLs for a trait identified with common flanking markers or overlapping CIs were treated as one QTL, with the CI reassigned by overlapping genetic positions.

Initial QTL collections used for MQTL analysis
A total of 1071 QTLs for KL, KW, KDR and KT traits derived from 36 bi-parental populations were retrieved from 34 published studies from 2007 to 2020 (Table S4). The size of the mapping populations varied from 99 to 547 lines of different types, including three double haploid (DH), seven F 2 and 26 RILs populations evaluated in different years and locations. The population information, including target traits, population parents, population types, and the number of markers used in the genetic map, was listed in Table S4.

QTLs localization on the reference map
A high-density map containing 7352 markers, including SSR, DArT, SNP and other types of markers, was used as a reference map in this study [75]. The total length of the reference map is 4994.0 cM with an average distance of 0.68 cM. The original QTL data and associated individual genetic maps from previous studies, and the reference map, were used as input files to create a consensus map h 2 = σ 2 g / σ 2 g + σ 2 ge /r + σ 2 e /re ( Fig. S2) and perform MQTL analysis with BioMercator V4.2.3 [100]. The position, chromosome groups, proportion of phenotypic variance explained (PVE or R 2 ), and logarithm of odds ratio (LOD score) were recorded for each of the QTLs in the 36 studies. The formula CI = 530/(N × R 2 ) for BC and F 2 lines, CI = 287/(N × R 2 ) for DH lines, and CI = 163/(N × R 2 ) for RILs lines was applied to calculate the 95% CIs of QTLs, where N is the population size and R 2 is the proportion of phenotypic variance explained of the QTL [101]. For QTLs without well-defined LOD scores and R 2 , these criteria were arbitrarily set at 3 and 10%, respectively. All collected QTLs with appropriate information were projected onto the reference map using BioMercator V4.2.3 [100]. The approach proposed by Goffinet and Gerber (2000) [27] was used when the number of QTLs per chromosome was ten or less, while the two-step algorithm was used when the number of QTLs per chromosome was higher than ten [102]. The Akaike Information Criterion (AIC) statistics were used to determine the best model for defining the number of MQTLs or "true" QTLs that best represent the original QTLs. The algorithms and statistical procedures implemented in this software are well described in previous studies [100,102,103].

Identification of candidate genes
To identify candidate genes, initially, the marker or its related primer sequences on both sides of the MQTL confidence intervals were manually searched using URGI Wheat (https:// wheat-urgi. versa illes. inra. fr), GrainGenes (https:// wheat. pw. usda. gov/ GG3/), DArT (https:// www. diver sitya rrays. com) and the Illumina company (https:// www. illum ina. com) databases. The obtained sequences were then aligned to IWGSC RefSeq v1.1 (https:// wheaturgi. versa illes. inra. fr/) to find the physical location of each marker. Candidate genes for this MQTL with a physical interval of less than 20 Mb were identified, and their associated functions were compared to choose the best possible candidates. The candidate genes were also investigated using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses using Omicshare online tools (https:// www. omics hare. com/).

In-silico expression analysis of candidate genes
The transcriptomic data of several wheat tissues deposited in the Expression Visualization and Integration Platform (expVIP, https:// www. wheat-expre ssion. com/) were downloaded to study the in-silico tissue expression of candidate genes [104]. This included 18 tissues throughout the wheat growth period [105]. The expression levels of candidate genes were assessed by transcripts per million (TPM) and visualized using the heatmap of TBtools software (https:// github. com/ CJChen/ TBtoo ls/ relea ses).

Supplementary Information
The online version contains supplementary material available at https:// doi. org/ 10. 1186/ s12870-022-03989-9.  Table S1. Evaluation of the kernel size-related traits in RILs population and their parents under different environments. Table S2. ANOVA and heritability of kernel size-related traits in the Q9086/Longjian19 RILs population. Table S3. Summary of the QTLs identified for kernel size-related traits in all the environments in the Q9086/Longjian19 RILs population. Table S4. Summary of the QTL studies used for conducting MQTL analysis for kernel-size related traits in wheat. Table S5. MQTLs for kernel size-related traits identified in this study. Table S6. The information of candidate genes predicted within seven key intervals for stable QTLs and QTL clusters underlying kernel traits